152 research outputs found
Towards Persistence-Based Reconstruction in Euclidean Spaces
Manifold reconstruction has been extensively studied for the last decade or
so, especially in two and three dimensions. Recently, significant improvements
were made in higher dimensions, leading to new methods to reconstruct large
classes of compact subsets of Euclidean space. However, the complexities
of these methods scale up exponentially with d, which makes them impractical in
medium or high dimensions, even for handling low-dimensional submanifolds. In
this paper, we introduce a novel approach that stands in-between classical
reconstruction and topological estimation, and whose complexity scales up with
the intrinsic dimension of the data. Specifically, when the data points are
sufficiently densely sampled from a smooth m-submanifold of R^d, our
method retrieves the homology of the submanifold in time at most c(m) n^5,
where n is the size of the input and c(m) is a constant depending solely on
m. It can also provably handle a wide range of compact subsets of
R^d, though with worse complexities. Along the way to proving the
correctness of our algorithm, we obtain new results on Čech, Rips, and
witness complex filtrations in Euclidean spaces.
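The homology-retrieval step can be illustrated at its simplest level, the 0-dimensional homology (connected components) of a Vietoris-Rips complex, with a union-find sketch. This toy Python version is ours for illustration, not the paper's algorithm, which relies on witness complexes and persistence:

```python
import itertools
import math

def rips_h0(points, scale):
    """Number of connected components (0-th Betti number) of the
    Vietoris-Rips complex built on `points` at the given scale."""
    parent = list(range(len(points)))

    def find(i):
        # Find the representative of i's component, with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # An edge of the Rips complex joins any two points within `scale`.
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        if math.dist(p, q) <= scale:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})
```

At a small scale two well-separated clusters stay distinct; at a large scale everything merges into one component.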
Rates of convergence for robust geometric inference
Distances to compact sets are widely used in the field of Topological Data
Analysis for inferring geometric and topological features from point clouds. In
this context, the distance to a probability measure (DTM) was introduced
by Chazal et al. (2011) as a robust alternative to the distance to a compact set.
In practice, the DTM can be estimated by its empirical counterpart, that is the
distance to the empirical measure (DTEM). In this paper we give a tight control
of the deviation of the DTEM. Our analysis relies on a local analysis of
empirical processes. In particular, we show that the rate of convergence of
the DTEM directly depends on the regularity at zero of a particular quantile
function that encodes local information about the geometry of the
support. This quantile function is the relevant quantity for describing precisely
how difficult a geometric inference problem is. Several numerical experiments
illustrate the convergence of the DTEM and also confirm that our bounds are
tight.
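The DTEM studied above has a simple closed form: the empirical DTM at mass parameter m is the root mean of the squared distances to the k = ceil(mn) nearest sample points. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def dtem(data, query, m):
    """Distance to the empirical measure at `query`, with mass
    parameter m in (0, 1]: root mean of the squared distances to the
    k = ceil(m * n) nearest of the n sample points."""
    n = len(data)
    k = max(1, int(np.ceil(m * n)))
    # Squared distances from the query to every sample point, sorted.
    d2 = np.sort(np.sum((data - query) ** 2, axis=1))[:k]
    return float(np.sqrt(d2.mean()))
```

Unlike the plain distance to the point cloud (the k = 1 case), averaging over a positive fraction of the sample makes the estimate robust to a few outliers.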
Data driven estimation of Laplace-Beltrami operator
Approximations of Laplace-Beltrami operators on manifolds through graph
Laplacians have become popular tools in data analysis and machine learning.
These discretized operators usually depend on bandwidth parameters whose tuning
remains a theoretical and practical problem. In this paper, we address this
problem for the unnormalized graph Laplacian by establishing an oracle
inequality that opens the door to a well-founded data-driven procedure for the
bandwidth selection. Our approach relies on recent results by Lacour and
Massart [LM15] on Lepski's method.
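The unnormalized graph Laplacian whose bandwidth is being tuned can be sketched as L = D - W for a Gaussian kernel of bandwidth h; the kernel and its normalization below are our illustrative choices, not the paper's exact setting:

```python
import numpy as np

def graph_laplacian(points, h):
    """Unnormalized graph Laplacian L = D - W with Gaussian weights
    W_ij = exp(-||x_i - x_j||^2 / h^2) (and W_ii = 0), where D is the
    diagonal degree matrix."""
    diff = points[:, None, :] - points[None, :, :]
    w = np.exp(-np.sum(diff ** 2, axis=-1) / h ** 2)
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w
```

By construction L is symmetric, its rows sum to zero, and it is positive semi-definite; the bandwidth h controls how local the resulting discrete operator is, which is exactly the parameter the oracle inequality is meant to select.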
Focus in Ewe
In this paper, a strides detection algorithm is proposed using inertial sensors worn on the ankle. This innovative approach, based on geometric patterns, can detect both normal walking strides and atypical strides such as small steps, side steps and backward walking, which existing methods struggle to detect. It is also robust in critical situations, for example when the wearer is sitting and moving the ankle, where most algorithms in the literature would wrongly detect strides.
Optimal rates of convergence for persistence diagrams in Topological Data Analysis
Computational topology has recently seen an important development toward
data analysis, giving birth to the field of topological data analysis.
Topological persistence, or persistent homology, appears as a fundamental tool
in this field. In this paper, we study topological persistence in general
metric spaces, with a statistical approach. We show that the use of persistent
homology can be naturally considered in general statistical frameworks and
persistence diagrams can be used as statistics with interesting convergence
properties. Some numerical experiments are performed in various contexts to
illustrate our results.
The density of expected persistence diagrams and its kernel based estimation
Extended version of the SoCG proceedings, submitted to a journal. Persistence diagrams play a fundamental role in Topological Data Analysis, where they are used as topological descriptors of filtrations built on top of data. They consist of discrete multisets of points in the plane R^2 that can equivalently be seen as discrete measures on R^2. When the data come as a random point cloud, these discrete measures become random measures whose expectation is studied in this paper. First, we show that for a wide class of filtrations, including the Čech and Rips-Vietoris filtrations, the expected persistence diagram, which is a deterministic measure on R^2, has a density with respect to the Lebesgue measure. Second, building on the previous result, we show that the persistence surface recently introduced in [Adams et al., Persistence images: a stable vector representation of persistent homology] can be seen as a kernel estimator of this density. We propose a cross-validation scheme for selecting an optimal bandwidth, which is proven to be a consistent procedure to estimate the density.
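The persistence surface viewed here as a kernel estimator can be sketched as a sum of Gaussians centred at the diagram points; the grid, kernel, and optional weight below are illustrative choices, not the paper's exact construction:

```python
import numpy as np

def persistence_surface(diagram, grid_x, grid_y, bandwidth, weight=None):
    """Kernel estimate of a diagram's density on a grid: a sum of
    isotropic Gaussians centred at the (birth, death) points,
    optionally weighted (e.g. by persistence d - b)."""
    surface = np.zeros((len(grid_y), len(grid_x)))
    for (b, d) in diagram:
        w = weight(b, d) if weight else 1.0
        gx = np.exp(-(grid_x - b) ** 2 / (2 * bandwidth ** 2))
        gy = np.exp(-(grid_y - d) ** 2 / (2 * bandwidth ** 2))
        # Outer product gives the 2-D Gaussian bump on the grid.
        surface += w * np.outer(gy, gx) / (2 * np.pi * bandwidth ** 2)
    return surface
```

The bandwidth plays the same role as in any kernel density estimator, which is what makes the cross-validation scheme of the paper applicable.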
Optimal quantization of the mean measure and application to clustering of measures
This paper addresses the case where data come as point sets, or more generally as discrete measures. Our motivation is twofold: first, we intend to approximate with a compactly supported measure the mean of the measure generating process, which coincides with the intensity measure in the point process framework, or with the expected persistence diagram in the framework of persistence-based topological data analysis. To this aim we provide two algorithms that we prove almost minimax optimal. Second, we build from the estimator of the mean measure a vectorization map that sends every measure into a finite-dimensional Euclidean space, and investigate its properties through a clustering-oriented lens. In a nutshell, we show that in a mixture of measure generating processes, our technique yields a representation in R^k, for k large enough, that guarantees a good clustering of the data points with high probability. Interestingly, our results apply in the framework of persistence-based shape classification via the ATOL procedure described in \cite{Royer19}.
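The vectorization map can be illustrated by its simplest instance: given a codebook approximating the mean measure, send each measure to the fraction of its mass nearest to each center. This is an ATOL-flavoured sketch of ours; the codebook is assumed given rather than produced by the paper's quantization algorithms:

```python
import numpy as np

def vectorize(measures, centers):
    """Send each measure (a point set, one row per point) to the
    vector recording the fraction of its points whose nearest
    codebook center is each given center."""
    vecs = []
    for pts in measures:
        # Pairwise distances from every point to every center.
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        vecs.append(np.bincount(labels, minlength=len(centers)) / len(pts))
    return np.array(vecs)
```

Measures concentrated near different centers receive well-separated vectors, which is what makes a standard clustering algorithm applicable downstream.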
Stochastic Convergence of Persistence Landscapes and Silhouettes
Persistent homology is a widely used tool in Topological Data Analysis that
encodes multiscale topological information as a multi-set of points in the
plane called a persistence diagram. It is difficult to apply statistical theory
directly to a random sample of diagrams. Instead, we can summarize the
persistent homology with the persistence landscape, introduced by Bubenik,
which converts a diagram into a well-behaved real-valued function. We
investigate the statistical properties of landscapes, such as weak convergence
of the average landscapes and convergence of the bootstrap. In addition, we
introduce an alternate functional summary of persistent homology, which we call
the silhouette, and derive an analogous statistical theory.
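Both summaries have short closed forms: the k-th landscape at t is the k-th largest of the triangle functions min(t - b, d - t)_+ over the diagram points (b, d), and a silhouette is a weighted mean of those triangles. A direct numpy-free sketch (the power weight p is one common choice, not the only one):

```python
def landscape(diagram, k, t):
    """k-th persistence landscape at t: the k-th largest value of the
    triangle functions min(t - b, d - t)_+ over points (b, d)."""
    vals = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram),
                  reverse=True)
    return vals[k - 1] if k <= len(vals) else 0.0

def silhouette(diagram, t, p=1.0):
    """Power-weighted silhouette at t: the mean of the same triangle
    functions, weighted by persistence (d - b) raised to the power p."""
    w = [(d - b) ** p for b, d in diagram]
    tri = [max(0.0, min(t - b, d - t)) for b, d in diagram]
    return sum(wi * ti for wi, ti in zip(w, tri)) / sum(w)
```

Because both functions are real-valued and well-behaved, sample averages and bootstrap confidence bands apply to them directly, which is the point of the statistical theory above.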
PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures
Persistence diagrams, the most common descriptors of Topological Data
Analysis, encode topological properties of data and have already proved pivotal
in many different applications of data science. However, since the (metric)
space of persistence diagrams is not a Hilbert space, they end up being difficult
inputs for most Machine Learning techniques. To address this concern, several
vectorization methods have been put forward that embed persistence diagrams
into either finite-dimensional Euclidean space or (implicit) infinite
dimensional Hilbert space with kernels. In this work, we focus on persistence
diagrams built on top of graphs. Relying on extended persistence theory and the
so-called heat kernel signature, we show how graphs can be encoded by
(extended) persistence diagrams in a provably stable way. We then propose a
general and versatile framework for learning vectorizations of persistence
diagrams, which encompasses most of the vectorization techniques used in the
literature. We finally showcase the experimental strength of our setup by
achieving competitive scores on classification tasks on real-life graph
datasets.
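The learned-vectorization framework can be caricatured in a few lines: apply a learnable point-wise transformation to every diagram point, then pool with a permutation-invariant operation. The ReLU map and pooling choices below are our illustration, not PersLay's full parameterization:

```python
import numpy as np

def perslay_like(diagram, theta, op="sum"):
    """Permutation-invariant vectorization of a diagram: apply one
    point-wise map phi(p) = relu(A p + c) to every point (b, d), then
    pool over points. theta = (A, c) are the learnable weights."""
    A, c = theta
    feats = np.maximum(0.0, diagram @ A.T + c)   # shape (n_points, out_dim)
    return feats.sum(axis=0) if op == "sum" else feats.max(axis=0)
```

The output is unchanged under any reordering of the diagram points, which is the structural property such a layer must satisfy, since a persistence diagram is an unordered multiset.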
High-Dimensional Topological Data Analysis
Modern data often come as point clouds embedded in high-dimensional Euclidean spaces, or possibly more general metric spaces. They are usually not distributed uniformly, but lie around some highly nonlinear geometric structures with nontrivial topology. Topological data analysis (TDA) is an emerging field whose goal is to provide mathematical and algorithmic tools to understand the topological and geometric structure of data. This chapter provides a short introduction to this new field through a few selected topics. The focus is deliberately put on the mathematical foundations rather than specific applications, with particular attention to stability results asserting the relevance of the topological information inferred from data.